Incorporating Paraphrasing in Translation Memory Matching and Retrieval
Author
Abstract
Current Translation Memory (TM) systems match at the surface level and lack semantic knowledge. This paper presents an approach to incorporating semantic knowledge, in the form of paraphrasing, into TM matching and retrieval. Most TM systems use Levenshtein edit distance or some variation of it. Generating additional segments from the paraphrases available for a segment results in exponential time complexity during matching, because a particular phrase can be paraphrased in several ways and a segment can contain several phrases that can be paraphrased. We propose an efficient approach to incorporating paraphrasing into edit distance, based on greedy approximation and dynamic programming. We obtained significant improvement in both retrieval and translation of retrieved segments for TM thresholds of 100%, 95% and 90%.
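To make the matching step concrete, the following is a minimal sketch of word-level Levenshtein edit distance computed by dynamic programming, together with a fuzzy-match score that can be compared against a TM threshold. It is not the algorithm proposed in the paper: the names `edit_distance` and `fuzzy_score`, the normalisation by the longer segment, and the treatment of paraphrases as zero-cost single-word substitutions are all assumptions made for illustration. The paper's approach handles multi-word paraphrases with greedy approximation, which this sketch does not attempt.

```python
def edit_distance(a, b, paraphrases=None):
    """Word-level Levenshtein distance between token lists a and b.

    paraphrases is an optional set of frozenset({x, y}) word pairs that are
    treated as zero-cost substitutions (a simplifying assumption).
    """
    paraphrases = paraphrases or set()
    # prev[j] holds the distance between the first i-1 tokens of a
    # and the first j tokens of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            same = x == y or frozenset((x, y)) in paraphrases
            curr.append(min(
                prev[j] + 1,                      # delete x
                curr[j - 1] + 1,                  # insert y
                prev[j - 1] + (0 if same else 1)  # match or substitute
            ))
        prev = curr
    return prev[-1]


def fuzzy_score(a, b, paraphrases=None):
    """Similarity in [0, 1]; normalising by the longer segment is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b, paraphrases) / longest


if __name__ == "__main__":
    query = "please buy the new license".split()
    stored = "please purchase the new license".split()
    pairs = {frozenset(("buy", "purchase"))}  # hypothetical paraphrase table
    print(fuzzy_score(query, stored))         # surface-level match
    print(fuzzy_score(query, stored, pairs))  # paraphrase-aware match
```

Under these assumptions, the stored segment falls below a 90% threshold on surface matching alone but counts as an exact match once the paraphrase pair is taken into account, which is the kind of retrieval gain the abstract describes.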
Similar resources
Can Translation Memories afford not to use paraphrasing?
This paper investigates to what extent the use of paraphrasing in translation memory (TM) matching and retrieval is useful for human translators. Current translation memories lack semantic knowledge like paraphrasing in matching and retrieval. Due to this, paraphrased segments are often not retrieved. Lack of semantic knowledge also results in inappropriate ranking of the retrieved segments. Gu...
Improving translation memory fuzzy matching by paraphrasing
Computer-assisted translation (CAT) tools have become the major language technology to support and facilitate the translation process. These kinds of programs store previously translated source texts and their equivalent target texts in a database and retrieve related segments during the translation of new texts. However, most of them are based on string or word edit distance, not allowing retri...
English-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...
Paraphrasing Spoken Japanese for Untangling Bilingual Transfer
One of the problems in spoken language translation is the enormous variety of expressions not found in text translation. This volume can lead to a sparse translation coverage. In order to tackle this problem, we take the practical approach of untangling slight variations in the source language before transferring a source expression to its target. We therefore discuss how effective paraphrasing ...
Paraphrasing and Translation
Usefulness of paraphrases
• Paraphrases are alternative ways of conveying the same information
• Useful in NLP applications such as:
– Generation: producing paraphrases allows for the creation of more varied and fluent text
– Multidocument summarization: identifying paraphrases allows information repeated across documents to be condensed
– Question answering: paraphrasing is important when going be...
Publication date: 2014